Blameless Users in a Clean Room: Defining Copyright Protection for Generative Models
Are there any conditions under which a generative model's outputs are guaranteed not to infringe the copyrights of its training data? This is the question of "provable copyright protection" first posed by Vyas, Kakade, and Barak (ICML 2023). They define near access-freeness (NAF) and propose it as sufficient for protection. This paper revisits the question and establishes new foundations for provable copyright protection -- foundations that are firmer both technically and legally. First, we show that NAF alone does not prevent infringement. In fact, NAF models can enable verbatim copying, a blatant failure of copy protection that we dub being tainted. Then, we introduce our blameless copy protection framework for defining meaningful guarantees, and instantiate it with clean-room copy protection. Clean-room copy protection allows a user to control their risk of copying by behaving in a way that is unlikely to copy in a counterfactual clean-room setting. Finally, we formalize a common intuition about differential privacy and copyright by proving that DP implies clean-room copy protection when the dataset is golden, a copyright deduplication requirement.
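To make the flavor of such a guarantee concrete, here is a minimal sketch of an NAF-style check, our illustration rather than the paper's construction: a model's output distribution is compared against that of a "safe" model trained without access to the copyrighted work, under a max-log-ratio divergence. The divergence choice, the toy distributions, and all names below are assumptions.

```python
# Minimal sketch (illustrative, not the paper's code) of a near access-free
# (NAF)-style bound: a model is k-NAF w.r.t. a copyrighted work if its output
# distribution stays within divergence k of a "safe" model trained without
# access to that work. Divergence choice and names are assumptions.
import numpy as np

def max_log_ratio(p, q, eps=1e-12):
    """Renyi-infinity-style divergence: max_y log(p(y) / q(y))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.max(np.log((p + eps) / (q + eps))))

def is_naf(model_dist, safe_dist, k):
    """True if the model's next-token distribution is within k nats of safe's."""
    return max_log_ratio(model_dist, safe_dist) <= k

# Toy next-token distributions over a 4-symbol vocabulary.
model = [0.70, 0.10, 0.10, 0.10]   # leans hard toward symbol 0
safe  = [0.25, 0.25, 0.25, 0.25]   # clean-room model that never saw the work

print(is_naf(model, safe, k=1.0))  # False: log(0.70 / 0.25) ~ 1.03 > 1.0
```

The abstract's point, on this reading, is that even a model passing such a per-output check can still enable verbatim copying, which is why the blameless and clean-room definitions are introduced.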
A Theoretical Analysis of Compositional Generalization in Neural Networks: A Necessary and Sufficient Condition
Compositional generalization is a crucial property in artificial intelligence, enabling models to handle novel combinations of known components. While most deep learning models lack this capability, certain models succeed on specific tasks, suggesting the existence of governing conditions. This paper derives a necessary and sufficient condition for compositional generalization in neural networks. Conceptually, it requires that (i) the computational graph matches the true compositional structure, and (ii) components encode just enough information during training. The condition is supported by mathematical proofs. The criterion combines aspects of architecture design, regularization, and training data properties. A carefully designed minimal example provides an intuitive understanding of the condition. We also discuss the potential of the condition for assessing compositional generalization before training. This work is a fundamental theoretical study of compositional generalization in neural networks.
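The condition itself is abstract, but a toy version of its first clause, a computational graph matching the true compositional structure, can be simulated in a few lines. Below is a minimal sketch, our illustration rather than the paper's example: an additive model whose graph mirrors an additive target generalizes exactly to a held-out combination of known components.

```python
# Minimal sketch (illustrative, not from the paper) of compositional
# generalization when the model's computational graph matches the target's
# compositional (here: additive) structure.
import numpy as np

# True compositional target: f(a, b) = phi(a) + psi(b).
phi = {0: 1.0, 1: 4.0, 2: 9.0}
psi = {0: 0.5, 1: 2.5, 2: 6.5}

# Training covers every component value but NOT the combination (2, 2).
train = [(a, b) for a in range(3) for b in range(3) if (a, b) != (2, 2)]
X = np.array([[a == i for i in range(3)] + [b == j for j in range(3)]
              for a, b in train], dtype=float)    # additive one-hot graph
y = np.array([phi[a] + psi[b] for a, b in train])

w, *_ = np.linalg.lstsq(X, y, rcond=None)         # fit the additive components

x_new = np.array([0, 0, 1, 0, 0, 1], dtype=float) # unseen combination (2, 2)
print(w @ x_new, phi[2] + psi[2])                 # both ~15.5: exact transfer
```

A model with a mismatched graph (e.g., a free-form lookup table over pairs) has no constraint tying the unseen combination to the training data, which is the intuition behind clause (i).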
Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency
Long, Jiangxuan, Song, Zhao, Yang, Chiwun
Recent studies suggest using generative models instead of traditional auto-regressive algorithms for time series forecasting (TSF). These non-auto-regressive approaches, based on generative methods including GANs, diffusion, and flow matching for time series, have empirically demonstrated high-quality generation and strong accuracy. However, we still lack a proper theoretical understanding of their approximation and generalization behavior. This paper presents the first theoretical framework addressing these questions from the perspective of flow-based generative models. In particular, we provide guarantees from three perspectives: $\textbf{Approximation}$, $\textbf{Generalization}$, and $\textbf{Efficiency}$. In detail, our analysis makes the following contributions: $\bullet$ Assuming a general data model, we show that flow-based generative models can be fit to arbitrarily small error, via the universal approximation property of the Diffusion Transformer (DiT). $\bullet$ Introducing a polynomial-based regularization for flow matching, we bound the generalization error using the generalization properties of polynomial approximation. $\bullet$ Viewing sampling for generation as an optimization process, we demonstrate its fast convergence under standard first-order gradient descent on a suitable objective.
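For readers unfamiliar with the objective being analyzed, the following is a minimal sketch of standard linear-interpolation flow matching on a toy series, with a plain linear velocity model standing in for the DiT the paper assumes; all names and hyperparameters are illustrative.

```python
# Minimal sketch (assumption: standard linear-interpolation flow matching,
# not the paper's DiT setup). Train a velocity field v_theta(x_t, t) to
# regress the target x1 - x0 along straight paths x_t = (1 - t) x0 + t x1.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy "time series" dimension
W = np.zeros((d, d + 1))                 # linear velocity model: v = W [x_t; t]

def batch(n=256):
    x0 = rng.standard_normal((n, d))                           # noise source
    x1 = np.cumsum(rng.standard_normal((n, d)), axis=1) * 0.3  # toy series
    t = rng.uniform(size=(n, 1))
    xt = (1 - t) * x0 + t * x1
    return np.hstack([xt, t]), x1 - x0                         # features, target

for step in range(2000):                 # plain first-order gradient descent
    F, v = batch()
    grad = 2 * (F @ W.T - v).T @ F / len(F)
    W -= 0.05 * grad

F, v = batch()
print("flow-matching MSE:", np.mean((F @ W.T - v) ** 2))
```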
Expected Variational Inequalities
Zhang, Brian Hu, Anagnostides, Ioannis, Tewolde, Emanuel, Berker, Ratip Emin, Farina, Gabriele, Conitzer, Vincent, Sandholm, Tuomas
Variational inequalities (VIs) encompass many fundamental problems in diverse areas ranging from engineering to economics and machine learning. However, their considerable expressivity comes at the cost of computational intractability. In this paper, we introduce and analyze a natural relaxation -- which we refer to as expected variational inequalities (EVIs) -- where the goal is to find a distribution that satisfies the VI constraint in expectation. By adapting recent techniques from game theory, we show that, unlike VIs, EVIs can be solved in polynomial time under general (nonmonotone) operators. EVIs capture the seminal notion of correlated equilibria, but enjoy a greater reach beyond games. We also employ our framework to capture and generalize several existing disparate results, including from settings such as smooth games, and games with coupled constraints or nonconcave utilities.
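As a concrete reading of the relaxation, our interpretation of the abstract with an illustrative operator and feasible set: rather than a single point $x^*$ with $\langle F(x^*), y - x^*\rangle \ge 0$ for all $y \in X$, an EVI asks for a distribution $\mu$ satisfying the constraint in expectation. The sketch below checks this for a toy candidate distribution.

```python
# Minimal sketch (our reading of the EVI constraint; operator and feasible
# set are illustrative): verify E_{x~mu} <F(x), y - x> >= 0 for all y in X.
import numpy as np

F = lambda x: np.array([-x[1], x[0]])      # rotation-style operator on R^2

# Candidate distribution: uniform over four points on the unit cross.
support = [np.array(p, dtype=float) for p in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
weights = [0.25] * 4

g = sum(w * F(x) for w, x in zip(weights, support))       # E[F(x)]
c = sum(w * (F(x) @ x) for w, x in zip(weights, support)) # E[<F(x), x>]

# The constraint E<F(x), y> >= E<F(x), x> is affine in y, so over the box
# X = [-1, 1]^2 it suffices to check the worst case: min_y <g, y> = -|g|_1.
print("EVI constraint holds:", -np.abs(g).sum() >= c - 1e-12)   # True
```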
Partial Identifiability and Misspecification in Inverse Reinforcement Learning
Skalse, Joar, Abate, Alessandro
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. This problem is difficult, for several reasons. First of all, there are typically multiple reward functions which are compatible with a given policy; this means that the reward function is only *partially identifiable*, and that IRL contains a certain fundamental degree of ambiguity. Secondly, in order to infer $R$ from $\pi$, an IRL algorithm must have a *behavioural model* of how $\pi$ relates to $R$. However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to fully capture with a simple model. This means that the behavioural model in practice will be *misspecified*, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this paper, we provide a comprehensive mathematical analysis of partial identifiability and misspecification in IRL. Specifically, we fully characterise and quantify the ambiguity of the reward function for all of the behavioural models that are most common in the current IRL literature. We also provide necessary and sufficient conditions that describe precisely how the observed demonstrator policy may differ from each of the standard behavioural models before that model leads to faulty inferences about the reward function $R$. In addition to this, we introduce a cohesive framework for reasoning about partial identifiability and misspecification in IRL, together with several formal tools that can be used to easily derive the partial identifiability and misspecification robustness of new IRL models, or analyse other kinds of reward learning algorithms.
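One classical source of the partial identifiability analyzed here is potential shaping: rewards that differ by a potential term induce identical optimal policies, so behaviour alone cannot distinguish them. The sketch below, a toy of our own rather than the paper's framework, verifies this in a small random MDP.

```python
# Minimal sketch (illustrative toy) of partial identifiability in IRL:
# potential-shaped rewards R'(s,a,s') = R(s,a,s') + gamma*phi(s') - phi(s)
# induce the same optimal policy as R (Ng et al., 1999).
import numpy as np

n_s, n_a, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] -> dist over s'
R = rng.standard_normal((n_s, n_a))               # base reward R(s, a)
phi = rng.standard_normal(n_s)                    # arbitrary potential

# Shaped reward with the expectation over s' folded in.
R_shaped = R + gamma * (P @ phi) - phi[:, None]

def optimal_policy(R):
    V = np.zeros(n_s)
    for _ in range(500):                          # value iteration
        Q = R + gamma * (P @ V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

print(optimal_policy(R), optimal_policy(R_shaped))  # identical policies
```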
A Methodology for Gradual Semantics for Structured Argumentation under Incomplete Information
Rago, Antonio, Vasileiou, Stylianos Loukas, Toni, Francesca, Son, Tran Cao, Yeoh, William
Gradual semantics have demonstrated great potential in argumentation, in particular for deploying quantitative bipolar argumentation frameworks (QBAFs) in a number of real-world settings, from judgmental forecasting to explainable AI. In this paper, we provide a novel methodology for obtaining gradual semantics for structured argumentation frameworks, where the building blocks of arguments and the relations between them are known, unlike in QBAFs, where arguments are abstract entities. Differently from existing approaches, our methodology accommodates incomplete information about arguments' premises. We demonstrate the potential of our approach by introducing two different instantiations of the methodology, leveraging existing gradual semantics for QBAFs in these more complex frameworks. We also define a set of novel properties for gradual semantics in structured argumentation and discuss their suitability relative to a set of existing properties. Finally, we provide a comprehensive theoretical analysis assessing the instantiations, demonstrating their advantages over existing gradual semantics for QBAFs and structured argumentation.
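To fix intuitions about what a gradual semantics computes, here is a minimal sketch of evaluating an acyclic QBAF under a DF-QuAD-style semantics, our illustrative choice rather than the paper's instantiations: each argument's strength starts at its base score and is pulled toward 0 by aggregated attackers and toward 1 by aggregated supporters.

```python
# Minimal sketch (DF-QuAD-style update, chosen for illustration) of a gradual
# semantics on a small acyclic QBAF: base scores in [0, 1], attackers pull
# strength toward 0, supporters toward 1.
def aggregate(vals):
    """Probabilistic-sum aggregation: 1 - prod(1 - v)."""
    out = 0.0
    for v in vals:
        out = out + v - out * v
    return out

def strengths(base, attackers, supporters):
    """Evaluate every argument of an acyclic QBAF in dependency order."""
    memo = {}
    def sigma(arg):
        if arg not in memo:
            va = aggregate([sigma(b) for b in attackers.get(arg, [])])
            vs = aggregate([sigma(b) for b in supporters.get(arg, [])])
            v0 = base[arg]
            memo[arg] = v0 - v0 * max(0.0, va - vs) + (1 - v0) * max(0.0, vs - va)
        return memo[arg]
    for a in base:
        sigma(a)
    return memo

base = {"a": 0.5, "b": 0.8, "c": 0.4}
attackers = {"a": ["b"]}           # b attacks a
supporters = {"a": ["c"]}          # c supports a
print(strengths(base, attackers, supporters))  # a ends at 0.3, pulled down by b
```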
Sum-of-squares lower bounds for Non-Gaussian Component Analysis
Diakonikolas, Ilias, Karmalkar, Sushrut, Pang, Shuo, Potechin, Aaron
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset. Specifically, given i.i.d.\ samples from a distribution $P^A_{v}$ on $\mathbb{R}^n$ that behaves like a known distribution $A$ in a hidden direction $v$ and like a standard Gaussian in the orthogonal complement, the goal is to approximate the hidden direction. The standard formulation posits that the first $k-1$ moments of $A$ match those of the standard Gaussian and the $k$-th moment differs. Under mild assumptions, this problem has sample complexity $O(n)$. On the other hand, all known efficient algorithms require $\Omega(n^{k/2})$ samples. Prior work developed sharp Statistical Query and low-degree testing lower bounds suggesting an information-computation tradeoff for this problem. Here we study the complexity of NGCA in the Sum-of-Squares (SoS) framework. Our main contribution is the first super-constant degree SoS lower bound for NGCA. Specifically, we show that if the non-Gaussian distribution $A$ matches the first $(k-1)$ moments of $\mathcal{N}(0, 1)$ and satisfies other mild conditions, then with fewer than $n^{(1 - \varepsilon)k/2}$ samples from the normal distribution, with high probability, degree $(\log n)^{{1\over 2}-o_n(1)}$ SoS fails to refute the existence of such a direction $v$. Our result significantly strengthens prior work by establishing a super-polynomial information-computation tradeoff against a broader family of algorithms. As corollaries, we obtain SoS lower bounds for several problems in robust statistics and the learning of mixture models. Our SoS lower bound proof introduces a novel technique that we believe may be of broader interest, along with a number of refinements over existing methods.
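The generative model in the abstract is easy to simulate, which helps ground the statement. A minimal sketch, with the illustrative choice of $A$ uniform on $[-\sqrt{3}, \sqrt{3}]$, which matches the first three moments of $\mathcal{N}(0,1)$ and differs in the fourth (i.e., $k = 4$):

```python
# Minimal sketch of the NGCA model P^A_v: a chosen non-Gaussian distribution A
# along a hidden unit direction v, standard Gaussian in the orthogonal
# complement. Here A = Uniform[-sqrt(3), sqrt(3)] (mean 0, variance 1).
import numpy as np

rng = np.random.default_rng(0)
n = 10
v = rng.standard_normal(n)
v /= np.linalg.norm(v)                        # hidden unit direction

def sample_PAv(m):
    g = rng.standard_normal((m, n))           # standard Gaussian in R^n
    g -= np.outer(g @ v, v)                   # remove the v-component
    a = rng.uniform(-np.sqrt(3), np.sqrt(3), size=m)   # non-Gaussian part
    return g + np.outer(a, v)

X = sample_PAv(50_000)
proj = X @ v
# Along v: mean ~0, variance ~1 (Gaussian-matching), 4th moment ~1.8 vs 3.
print(proj.mean(), proj.var(), (proj ** 4).mean())
```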
A More Unified Theory of Transfer Learning
Hanneke, Steve, Kpotufe, Samory
Domain Adaptation or Transfer Learning refers generally to the problem of harnessing data from a source distribution P to improve prediction performance w.r.t. a target distribution Q for which little or no data is available. This problem has been researched over the last few decades, with a recent resurgence of interest driven by modern applications that are often characterized by a scarcity of perfect target data. A fundamental question in the theory of domain adaptation (and variant problems on distribution shifts) is how to measure the relatedness between the source P and target Q distributions. Importantly, desired measures of relatedness should not only tightly capture the predictive information P has on Q, but should also be practically useful: that is, either the measure can be estimated from data to facilitate algorithmic design, or, more generally, it should somehow admit adaptive procedures, i.e., procedures whose performance adapts to the a priori unknown level of relatedness between P and Q. Many notions have been proposed over the last few decades, starting with the seminal works of Mansour et al. [2009] and Ben-David et al. [2010] on refinements of total variation for domain adaptation in classification, to more recent proposals for domain adaptation in regression, e.g., Wasserstein distances, Redko et al. [2017], Shen et al. [2018], or measures relating covariance structures across P and Q, as in Mousavi Kalan et al. [2020], Zhang et al. [2022b], Ge et al. [2023]. These various notions of relatedness appear hard to compare at first glance, leading to a disparate theory of domain adaptation at present, with no unified set of principles. Interestingly, as we show, upon a closer look at the existing literature -- whether in classification or regression -- it turns out that many seemingly distinct measures of relatedness proposed in domain adaptation actually implicitly bound the same fundamental quantities, which we refer to as the weak and strong moduli of transfer: they roughly measure how fast the Q-risk of predictors decreases as their P-risk decreases. These moduli always yield rates of transfer as tight as or tighter than those implied by many existing notions, while also admitting adaptive procedures in general settings, as shown via a reduction to the existence of certain confidence sets for the prediction problem at hand. These reductions, while of a theoretical nature, yield insights on general adaptive transfer approaches that are less tied to specific measures of relatedness between the source P and target Q.
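The moduli of transfer can be made concrete with a toy experiment, our construction rather than the paper's definitions: fix a predictor class and track the largest Q-risk among predictors whose P-risk is below a shrinking threshold r. The profile printed below plays the role of a weak modulus.

```python
# Minimal sketch (illustrative construction) of a weak-modulus-style profile:
# sup { Q-risk(h) : P-risk(h) <= r } over 1-d threshold classifiers, under a
# covariate shift between source P and target Q with a shared labeling rule.
import numpy as np

rng = np.random.default_rng(0)
label = lambda x: (x > 0.0).astype(int)           # shared labeling rule

xP = rng.normal(0.0, 1.0, 20_000)                 # source inputs
xQ = rng.normal(0.7, 1.0, 20_000)                 # shifted target inputs

def risk(x, t):
    """0-1 risk of the threshold classifier h_t(x) = 1[x > t]."""
    return np.mean((x > t).astype(int) != label(x))

ts = np.linspace(-2, 2, 401)
rP = np.array([risk(xP, t) for t in ts])
rQ = np.array([risk(xQ, t) for t in ts])

for r in [0.20, 0.10, 0.05, 0.01]:
    print(f"sup Q-risk over {{P-risk <= {r}}}: {rQ[rP <= r].max():.3f}")
```

As r shrinks, the surviving predictors concentrate near the P-optimal one and the worst-case Q-risk falls, which is the "rate of transfer" behavior the moduli quantify.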